Abstract

We consider a dynamic programming problem with arbitrary state space and bounded
rewards. Is it possible to define in a unique way a limit value for the problem, where
the "patience" of the decision-maker tends to infinity? We consider, for each
evaluation $\theta$ (a probability distribution over positive integers), the value function
$v_\theta$ of the problem where the weight of any stage $t$ is given by $\theta_t$, and we
investigate the uniform convergence of a sequence $(v_{\theta^k})_k$ when the "impatience"
of the evaluations vanishes, in the sense that
$\sum_t |\theta^k_t - \theta^k_{t+1}| \to_{k \to \infty} 0$. We prove that this uniform
convergence happens if and only if the metric space $\{v_{\theta^k}, k \ge 1\}$ is totally
bounded. Moreover there exists a particular function $v^*$, independent of the chosen
sequence $(\theta^k)_k$, such that any limit point of such a sequence of value functions is
precisely $v^*$. Consequently, when speaking of uniform convergence of the value
functions, $v^*$ may be considered as the unique possible limit when the patience of the
decision-maker tends to infinity. The result applies in particular to discounted payoffs
when the discount factor vanishes, as well as to average payoffs where the number of
stages goes to infinity, and also to models with stochastic transitions. We present
tractable corollaries, and we discuss counterexamples and a conjecture.

Keywords: dynamic programming, average payoffs, discounted payoffs, general evaluations,
limit value, vanishing impatience, uniform convergence of the values.



1 Introduction 

We consider a dynamic programming problem with arbitrary state space $Z$ and
bounded rewards. Is it possible to define in a unique way a limit value
for the problem, where the "patience" of the decision-maker tends to infinity?

For each evaluation (probability distribution over positive integers) $\theta = (\theta_t)_{t \ge 1}$,
we consider the value function $v_\theta$ of the problem where the initial state is arbitrary
in $Z$ and the weight of any stage $t$ is given by $\theta_t$. The total variation of $\theta$, that we
also call the impatience of $\theta$, is defined by $TV(\theta) = \sum_{t=1}^{\infty} |\theta_{t+1} - \theta_t|$.
For instance, for each positive integer $n$ the evaluation $\theta = (1/n, \dots, 1/n, 0, 0, \dots)$,
with $n$ entries equal to $1/n$, induces the value function $v_n$ corresponding to the
maximization of the mean payoff for the first $n$ stages; and for any $\lambda$ in $(0,1]$ the
evaluation $\theta = (\lambda(1-\lambda)^{t-1})_{t \ge 1}$ induces the discounted value function $v_\lambda$.

A well known theorem of Hardy and Littlewood (see e.g. Lippman, 1969) implies
that for an uncontrolled problem, the pointwise convergence of $(v_n)_n$, when
$n$ goes to infinity, and of $(v_\lambda)_\lambda$, when $\lambda$ goes to 0, are equivalent, and that in
case of convergence both limits are the same. However, Lehrer and Sorin (1992)
provided an example of a dynamic programming problem where $(v_n)_n$ and $(v_\lambda)_\lambda$
have different pointwise limits. But they also proved that the uniform convergence
of $(v_n)_n$ and of $(v_\lambda)_\lambda$ are equivalent, with equality of the limits in case of
convergence. Sorin and Monderer (1993) extended this result to families of
evaluations satisfying some conditions. Mertens and Neyman (1982) proved that
when the family $(v_\lambda)_\lambda$ not only uniformly converges but has bounded variation,
then the dynamic programming problem has a uniform value, in the sense that
for every initial state $z$ and $\varepsilon > 0$, there exists a play with mean payoffs from stage
1 to stage $T$ at least $v - \varepsilon$ provided $T$ is large enough (see also Lehrer Monderer
1994 and Sorin Monderer 1993 for proofs that the uniform convergence of $(v_\lambda)_\lambda$ or
$(v_n)_n$ does not imply the existence of the uniform value of the problem). In this
case of existence of a uniform value, one can show that all value functions $v_\theta$ are
close to $v^*$, whenever $\theta$ is a non increasing evaluation with small $\theta_1$. The reason
is that whenever $\theta$ is non increasing, the $\theta$-payoff of a play can be expressed as a
convex combination of the Cesaro values $(v_n)_n$.

In the present paper, we investigate the uniform convergence of sequences
$(v_{\theta^k})_k$ when the "impatience" of the evaluations vanishes, in the sense that
$\sum_t |\theta^k_t - \theta^k_{t+1}| \to_{k \to \infty} 0$. We will prove in theorem 2.5 that this uniform
convergence happens if and only if the metric space $\{v_{\theta^k}, k \ge 1\}$ (with the distance
between functions given by the sup of their differences) is totally bounded.
Moreover the uniform limit, whenever it exists, can only be the following function,
which is independent of the particular chosen sequence $(\theta^k)_k$:

$$v^* = \inf_{\theta \in \Theta} \sup_{m \ge 0} v_{m,\theta},$$

where $\Theta$ is the set of all evaluations and, for each evaluation $\theta = (\theta_t)_{t \ge 1}$, the
evaluation $m,\theta$ is defined as the evaluation with weight 0 for the first $m$ stages and
with weight $\theta_{t-m}$ for stages $t > m$.
Consequently, when speaking of uniform convergence of the value functions when
the patience of the decision-maker tends to infinity, $v^*$ can be considered as the
unique possible limit value. We also give simple conditions on the state space,
the payoffs and the transitions (mainly compactness, continuity and non
expansiveness) implying the uniform convergence of such value functions.

The paper is organized as follows: section 2 contains the model and the main
results, which are shown to extend to the case of stochastic transitions. Section
3 contains a few examples and counterexamples and section 4 contains the proof
of theorem 2.5. In the last section we formulate the following conjecture, which
is shown to be true for uncontrolled problems: does the uniform convergence
of $(v_n)_n$, or equivalently of $(v_\lambda)_\lambda$, imply the general convergence of the value
functions, in the sense that: $\forall \varepsilon > 0, \exists \alpha > 0, \forall \theta \in \Theta$ s.t. $TV(\theta) \le \alpha$,
$\|v_\theta - v^*\| \le \varepsilon$?

2 Model and results 

2.1 General values in dynamic programming problems 

We consider a dynamic programming problem given by a non empty set of states
$Z$, a correspondence $F$ with non empty values from $Z$ to $Z$, and a mapping $r$
from $Z$ to $[0,1]$. $Z$ is called the set of states, $F$ is the transition correspondence
and $r$ is the reward (or payoff) function. An initial state $z_0$ in $Z$ defines the
following dynamic programming problem: a decision maker, also called player,
first has to select a new state $z_1$ in $F(z_0)$, and is rewarded by $r(z_1)$. Then he has
to choose $z_2$ in $F(z_1)$, has a payoff of $r(z_2)$, etc. The decision maker is interested
in maximizing his "long-term" payoffs, whatever that means. From now
on we fix $\Gamma = (Z, F, r)$, and for every state $z_0$ we denote by $\Gamma(z_0) = (Z, F, r, z_0)$
the corresponding problem with initial state $z_0$. For $z_0$ in $Z$, a play at $z_0$ is a
sequence $s = (z_1, \dots, z_t, \dots) \in Z^\infty$ such that $\forall t \ge 1, z_t \in F(z_{t-1})$. We denote by
$S(z_0)$ the set of plays at $z_0$, and by $S = \cup_{z_0 \in Z} S(z_0)$ the set of all plays. The set
of bounded functions from $Z$ to $\mathbb{R}$ is denoted by $V$, and for $v$ and $v'$ in $V$ we use
the distance $d_\infty(v, v') = \sup_{z \in Z} |v(z) - v'(z)|$.

Cesaro values. For $n \ge 1$ and $s = (z_t)_{t \ge 1} \in S$, the average payoff of the play $s$
up to stage $n$ is defined by $\gamma_n(s) = \frac{1}{n} \sum_{t=1}^{n} r(z_t)$, and the $n$-stage average value
of $\Gamma(z_0)$ is $v_n(z_0) = \sup_{s \in S(z_0)} \gamma_n(s)$. By the Bellman-Shapley recursive formula,
for all $n$ and $z$ we have:

$$n\, v_n(z) = \sup_{z' \in F(z)} \big( r(z') + (n-1)\, v_{n-1}(z') \big).$$

We also have $|v_n(z) - \sup_{z' \in F(z)} v_n(z')| \le 2/n$, and a pointwise limit $v$ of $(v_n)_n$
should satisfy $v(z) = \sup_{z' \in F(z)} v(z')$ for all $z$.
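The recursion above can be sketched numerically. The following is a minimal illustration only, assuming a finite state space encoded as Python dictionaries; the function name `cesaro_values` and the three-state problem are hypothetical and do not come from the paper.

```python
from fractions import Fraction

def cesaro_values(F, r, n_max):
    """Return [v_1, ..., v_{n_max}], each a dict state -> n-stage average value.

    F: dict state -> non-empty list of successor states (assumed finite).
    r: dict state -> reward in [0, 1].
    Applies n * v_n(z) = max_{z' in F(z)} (r(z') + (n - 1) * v_{n-1}(z')).
    """
    total = {z: Fraction(0) for z in F}  # holds (n - 1) * v_{n-1}, zero before stage 1
    values = []
    for n in range(1, n_max + 1):
        # The comprehension reads the previous `total` before it is rebound.
        total = {z: max(Fraction(r[y]) + total[y] for y in F[z]) for z in F}
        values.append({z: total[z] / n for z in F})
    return values

# Hypothetical problem: from "a" the player may stay (reward 0) or move to the
# absorbing state "b" (reward 1); state "c" can only move to "a".
F = {"a": ["a", "b"], "b": ["b"], "c": ["a"]}
r = {"a": 0, "b": 1, "c": 0}
v = cesaro_values(F, r, 50)

# Check the bound |v_n(z) - max_{z' in F(z)} v_n(z')| <= 2/n on this instance.
for n, vn in enumerate(v, start=1):
    for z in F:
        assert abs(vn[z] - max(vn[y] for y in F[z])) <= Fraction(2, n)
```

Exact rational arithmetic is used so that the inequality is checked without rounding error; on an infinite state space the maximum would have to be replaced by a supremum.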

Discounted values. Given $\lambda \in (0,1]$, the $\lambda$-discounted payoff of a play
$s = (z_t)_{t \ge 1}$ is $\gamma_\lambda(s) = \sum_{t=1}^{\infty} \lambda (1-\lambda)^{t-1} r(z_t)$, and the $\lambda$-discounted value at the
initial state $z_0$ is $v_\lambda(z_0) = \sup_{s \in S(z_0)} \gamma_\lambda(s)$. It is easily proved that $v_\lambda$ is the
unique mapping in $V$ satisfying the fixed point equation: $\forall z \in Z$,
$v_\lambda(z) = \sup_{z' \in F(z)} \big( \lambda\, r(z') + (1-\lambda)\, v_\lambda(z') \big)$. It implies
$|v_\lambda(z) - \sup_{z' \in F(z)} v_\lambda(z')| \le \lambda$, and a pointwise limit $v$ of $(v_\lambda)_\lambda$ should also
satisfy $v(z) = \sup_{z' \in F(z)} v(z')$ for all $z$.
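The fixed point equation suggests a simple iterative scheme. The sketch below is an illustration under the assumption of a finite state space; the name `discounted_values`, the tolerance and the small example are hypothetical choices, not part of the paper.

```python
def discounted_values(F, r, lam, tol=1e-12):
    """Approximate v_lambda by iterating v <- max_{z'} (lam * r(z') + (1 - lam) * v(z')).

    F: dict state -> non-empty list of successors, r: dict state -> reward in [0, 1].
    The map is a (1 - lam)-contraction for the sup norm, so the loop terminates.
    """
    v = {z: 0.0 for z in F}
    while True:
        w = {z: max(lam * r[y] + (1 - lam) * v[y] for y in F[z]) for z in F}
        if max(abs(w[z] - v[z]) for z in F) < tol:
            return w
        v = w

# Hypothetical problem: "a" may stay (reward 0) or jump to the absorbing "b"
# (reward 1); "c" can only move to "a".
F = {"a": ["a", "b"], "b": ["b"], "c": ["a"]}
r = {"a": 0, "b": 1, "c": 0}
v = discounted_values(F, r, lam=0.1)

# Check |v_lambda(z) - max_{z' in F(z)} v_lambda(z')| <= lambda, up to rounding.
for z in F:
    assert abs(v[z] - max(v[y] for y in F[z])) <= 0.1 + 1e-9
```

In this instance the player starting at "c" loses exactly the first stage, so the computed value at "c" is close to $1 - \lambda$.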

General values. We denote by $\Theta$ the set of probability distributions over positive
integers. An element $\theta = (\theta_t)_{t \ge 1}$ in $\Theta$ is called an evaluation.

Definition 2.1. The $\theta$-payoff of a play $s = (z_t)_{t \ge 1}$ is $\gamma_\theta(s) = \sum_{t=1}^{\infty} \theta_t\, r(z_t)$,
and the $\theta$-value of $\Gamma(z_0)$ is

$$v_\theta(z_0) = \sup_{s \in S(z_0)} \gamma_\theta(s).$$

For each stage $t$ we denote by $\delta_t$ the Dirac mass on stage $t$ and by $\bar{n}$ the
Cesaro evaluation $(1/n, \dots, 1/n, 0, 0, \dots) = \frac{1}{n} \sum_{t=1}^{n} \delta_t$, so that the notation
$v_\theta$ for $\theta = \bar{n}$ coincides with the Cesaro value $v_{\bar{n}}$, also written $v_n$. It is easy to
see that for each evaluation $\theta$, the Bellman recursive formula can be written as
follows:

$$v_\theta(z) = \sup_{z' \in F(z)} \big( \theta_1\, r(z') + (1 - \theta_1)\, v_{\theta^+}(z') \big),$$

where if $\theta_1 < 1$, the "shifted" evaluation $\theta^+$ is defined as $\left( \frac{\theta_{t+1}}{1 - \theta_1} \right)_{t \ge 1}$.
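For a finitely supported evaluation on a finite state space, the $\theta$-value can be computed by backward induction, and the Bellman formula with the shifted evaluation can then be checked directly. This is only a sketch under those two finiteness assumptions; `theta_value` and `shifted` are hypothetical helper names.

```python
from fractions import Fraction

def theta_value(F, r, theta):
    """theta-value for a finitely supported evaluation theta = [theta_1, ..., theta_T].

    Backward induction on the unnormalised continuation payoff:
    w_t(z) = max_{z' in F(z)} (theta_t * r(z') + w_{t+1}(z')), with w_{T+1} = 0.
    """
    w = {z: Fraction(0) for z in F}
    for weight in reversed(theta):  # stages T, T-1, ..., 1
        w = {z: max(weight * r[y] + w[y] for y in F[z]) for z in F}
    return w

def shifted(theta):
    """Shifted evaluation theta^+ = (theta_{t+1} / (1 - theta_1))_t; needs theta_1 < 1."""
    return [x / (1 - theta[0]) for x in theta[1:]]

# Hypothetical problem and evaluation.
F = {"a": ["a", "b"], "b": ["b"], "c": ["a"]}
r = {"a": 0, "b": 1, "c": 0}
theta = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

v = theta_value(F, r, theta)
v_plus = theta_value(F, r, shifted(theta))

# Bellman formula: v_theta(z) = max_{z'} (theta_1 r(z') + (1 - theta_1) v_{theta^+}(z')).
for z in F:
    assert v[z] == max(theta[0] * r[y] + (1 - theta[0]) * v_plus[y] for y in F[z])
```

The unnormalised recursion avoids dividing by $1 - \theta_1$ at every stage; the normalised form stated in the text is recovered by the final check.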



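Lemma 2.2. For every evaluation $\theta$ in $\Theta$ and state $z$ in $Z$,

$$\Big| v_\theta(z) - \sup_{z' \in F(z)} v_\theta(z') \Big| \le \theta_1 + \sum_{t \ge 2} |\theta_t - \theta_{t-1}|.$$

Proof: Consider any $z_1 \in F(z)$, and for $\varepsilon > 0$ a play $s = (z_2, z_3, \dots)$ in $S(z_1)$
such that $\gamma_\theta(s) \ge v_\theta(z_1) - \varepsilon$. We have:

$$\begin{aligned}
v_\theta(z) &\ge \theta_1 r(z_1) + \sum_{t=2}^{\infty} \theta_t\, r(z_t) \\
&\ge \theta_1 r(z_1) + \sum_{t=2}^{\infty} \theta_{t-1}\, r(z_t) + \sum_{t=2}^{\infty} (\theta_t - \theta_{t-1})\, r(z_t) \\
&\ge v_\theta(z_1) - \varepsilon - \sum_{t=2}^{\infty} |\theta_t - \theta_{t-1}|.
\end{aligned}$$

Conversely, choose $s = (z_1, z_2, \dots)$ in $S(z)$ such that $\gamma_\theta(s) \ge v_\theta(z) - \varepsilon$. Then

$$\begin{aligned}
v_\theta(z) &\le \varepsilon + \theta_1 r(z_1) + \sum_{t=2}^{\infty} \theta_{t-1}\, r(z_t) + \sum_{t=2}^{\infty} (\theta_t - \theta_{t-1})\, r(z_t) \\
&\le \varepsilon + \theta_1 + v_\theta(z_1) + \sum_{t=2}^{\infty} |\theta_t - \theta_{t-1}|.
\end{aligned}$$

□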
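As a worked instance of lemma 2.2 (a sketch, using only the definitions of the Cesaro and discounted evaluations given earlier), the right-hand side of the bound can be computed explicitly in the two classical cases:

```latex
% Cesaro evaluation \theta = (1/n, ..., 1/n, 0, 0, ...): the only jump is at t = n + 1.
\theta_1 + \sum_{t \ge 2} |\theta_t - \theta_{t-1}|
  = \frac{1}{n} + |\theta_{n+1} - \theta_n|
  = \frac{2}{n}.

% Discounted evaluation \theta_t = \lambda (1 - \lambda)^{t-1}:
\theta_1 + \sum_{t \ge 2} |\theta_t - \theta_{t-1}|
  = \lambda + \sum_{t \ge 2} \lambda^2 (1 - \lambda)^{t-2}
  = \lambda + \lambda
  = 2\lambda.
```

In the discounted case this is weaker than the bound $\lambda$ obtained directly from the fixed point equation, which is consistent since lemma 2.2 is stated for arbitrary evaluations.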



Definition 2.3. The total variation of an evaluation $\theta = (\theta_t)_{t \ge 1}$ is

$$TV(\theta) = \sum_{t=1}^{\infty} |\theta_{t+1} - \theta_t|.$$

We have $\sup_t \theta_t \le TV(\theta) \le 2$. In the case of a Cesaro evaluation
$\theta = (1/n, \dots, 1/n, 0, 0, \dots)$, we have $TV(\theta) = 1/n$. For a discounted evaluation
$\theta = (\lambda(1-\lambda)^{t-1})_{t \ge 1}$, we have $TV(\theta) = \lambda$. A small $TV(\theta)$ corresponds to a patient
evaluation, and sometimes we will refer to $TV(\theta)$ as the impatience of $\theta$. We will consider
here limits when $TV(\theta)$ goes to zero, generalizing the cases where $n \to \infty$ or $\lambda \to 0$.
Notice that if an evaluation $\theta$ is non increasing, i.e. satisfies $\theta_{t+1} \le \theta_t$ for all $t$,
we have that $TV(\theta) = \theta_1$. In the case of a sequence of non increasing evaluations
$(\theta^k)_k$, the condition $TV(\theta^k) \to_{k \to \infty} 0$ is equivalent to the condition
$\sup_{t \ge 1} \theta^k_t \to_{k \to \infty} 0$. We always have:

$$(1 - \theta_1) \sum_{t \ge 1} |\theta^+_t - \theta_t| \le \theta_1 + TV(\theta),$$

so if $TV(\theta)$ is small, the $\ell^1$-distance between $\theta$ and the shifted evaluation $\theta^+$ is
also small. Notice also the following inequalities: for any given $T$, denote by $\bar{\theta}(T)$
the arithmetic mean of $\theta_1, \dots, \theta_T$. We have for all $t = 1, \dots, T$:

$$|\theta_t - \bar{\theta}(T)| \le \sum_{t'=1}^{T-1} |\theta_{t'+1} - \theta_{t'}|.$$

So if $TV(\theta)$ is small, then for all $T$ and $t \le T$, the weight $\theta_t$ is close to the average
$\bar{\theta}(T)$.
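The quantities discussed above are easy to compute for finitely supported evaluations. The following sketch is illustrative only; the helper names `total_variation` and `cesaro` are hypothetical, and an evaluation is represented by the list of its first weights, with zeros afterwards.

```python
from fractions import Fraction

def total_variation(theta):
    """TV(theta) = sum_t |theta_{t+1} - theta_t| for a list theta followed by zeros."""
    padded = list(theta) + [0]  # theta_{T+1} = 0 closes the support
    return sum(abs(b - a) for a, b in zip(padded, padded[1:]))

def cesaro(n):
    """Cesaro evaluation (1/n, ..., 1/n, 0, 0, ...)."""
    return [Fraction(1, n)] * n

# Cesaro evaluation: TV = 1/n.
assert total_variation(cesaro(7)) == Fraction(1, 7)

# Non increasing evaluation: TV = theta_1.
non_increasing = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
assert total_variation(non_increasing) == non_increasing[0]

# Weight 1/k on the odd stages 1, 3, ..., 2k-1: sup_t theta_t = 1/k is small,
# yet TV = (2k - 1)/k stays close to the upper bound 2.
k = 5
odd = [Fraction(1, k) if t % 2 == 1 else Fraction(0) for t in range(1, 2 * k)]
assert max(odd) == Fraction(1, k)
assert total_variation(odd) == Fraction(2 * k - 1, k)

# Each weight is within the total variation of the arithmetic mean of the first T weights.
T = len(odd)
mean = sum(odd) / T
inner_variation = sum(abs(b - a) for a, b in zip(odd, odd[1:]))
assert all(abs(x - mean) <= inner_variation for x in odd)
```

The third evaluation shows why a vanishing $\sup_t \theta_t$ is a strictly weaker requirement than a vanishing total variation once monotonicity is dropped.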

Given an evaluation $\theta$ and $m \ge 0$, we write $v_{m,\theta}$ for the value function
associated to the evaluation $\theta' = \sum_{t=1}^{\infty} \theta_t\, \delta_{m+t}$. The following function will play a very
important role in the sequel:

Definition 2.4. Define for all $z$ in $Z$,

$$v^*(z) = \inf_{\theta \in \Theta} \sup_{m \ge 0} v_{m,\theta}(z).$$


2.2 Main results 

We now state the main result of this paper. Recall that a metric space is totally
bounded, or precompact, if for all $\varepsilon > 0$ it can be covered by finitely many balls
with radius $\varepsilon$.

Theorem 2.5. Let $(\theta^k)_{k \ge 1}$ be a sequence of evaluations such that $TV(\theta^k) \to_{k \to \infty} 0$.
We have for all $z$ in $Z$:

$$v^*(z) = \inf_{k \ge 1} \sup_{m \ge 0} v_{m,\theta^k}(z).$$

Moreover, the sequence $(v_{\theta^k})_k$ uniformly converges if and only if the metric space
$(\{v_{\theta^k}, k \ge 1\}, d_\infty)$ is totally bounded. And in case of convergence, the limit value
is $v^*$.

This theorem generalizes theorem 3.10 in Renault, 2011, which only dealt with
Cesaro evaluations¹. In particular, there is a unique possible limit point
for all sequences $(v_{\theta^k})_k$ such that $TV(\theta^k) \to_{k \to \infty} 0$, and consequently any
(uniform) limit of such a sequence is $v^*$. Notice that this is not true if we replace
uniform convergence by pointwise convergence: even for uncontrolled problems,
it may happen that several limit points are possible. As an immediate corollary
of theorem 2.5, when $Z$ is finite the sequence $(v_{\theta^k})_k$ is bounded and has a unique
limit point, so converges to $v^*$.

Corollary 2.6. Assume that $Z$ is endowed with a distance $d$ such that: a) $(Z, d)$ is
a precompact metric space, and b) the family $(v_\theta)_{\theta \in \Theta}$ is uniformly equicontinuous.
Then there is general uniform convergence of the value functions to $v^*$, i.e.

$$\forall \varepsilon > 0, \exists \alpha > 0, \forall \theta \in \Theta \text{ s.t. } TV(\theta) \le \alpha, \quad \|v_\theta - v^*\| \le \varepsilon.$$

The proof of corollary 2.6 from theorem 2.5 follows from 1) Ascoli's theorem,
and 2) the fact that the convergence of $(v_{\theta^k})_k$ to $v^*$ for each sequence of
evaluations such that $TV(\theta^k) \to_{k \to \infty} 0$ is enough to have the general uniform convergence
of the value functions to $v^*$.

Corollary 2.7. Assume that $Z$ is endowed with a distance $d$ such that: a) $(Z, d)$
is a precompact metric space, b) $r$ is uniformly continuous, and c) $F$ is non
expansive, i.e. $\forall z \in Z, \forall z' \in Z, \forall z_1 \in F(z), \exists z'_1 \in F(z')$ s.t. $d(z_1, z'_1) \le d(z, z')$.
Then we have the same conclusions as corollary 2.6: there is general uniform
convergence of the value functions to $v^*$, i.e.

$$\forall \varepsilon > 0, \exists \alpha > 0, \forall \theta \in \Theta \text{ s.t. } TV(\theta) \le \alpha, \quad \|v_\theta - v^*\| \le \varepsilon.$$

Proof of corollary 2.7: One can proceed as in the proof of corollary 3.9 in
Renault, 2011. Given two states $z$ and $z'$, one can construct inductively from
each play $s = (z_t)_{t \ge 1}$ at $z$ a play $s' = (z'_t)_{t \ge 1}$ at $z'$ such that $d(z_t, z'_t) \le d(z, z')$
for all $t$. Regarding payoffs, we introduce the modulus of continuity $\hat{\varepsilon}$ of $r$ by:
$\hat{\varepsilon}(\alpha) = \sup_{z, z' \text{ s.t. } d(z, z') \le \alpha} |r(z) - r(z')|$ for each $\alpha \ge 0$.
So $|r(z) - r(z')| \le \hat{\varepsilon}(d(z, z'))$ for each pair of states $z, z'$, and $\hat{\varepsilon}$ is continuous at 0.
Using the previous construction, we obtain that for $z$ and $z'$ in $Z$, for all $k \ge 1$,
$|v_{\theta^k}(z) - v_{\theta^k}(z')| \le \hat{\varepsilon}(d(z, z'))$. In particular, the family $(v_{\theta^k})_{k \ge 1}$ is uniformly
equicontinuous, and corollary 2.6 gives the result. □

A completely different proof of corollary 2.7, with another expression for the
limit value $v^*$, can be found in theorem 3.9 of Renault Venel 2012.



¹ In this paper it is also proved that if the Cesaro values $(v_n)_n$ uniformly converge then the
limit can only be $\inf_{n \ge 1} \sup_{m \ge 0} v_{m,n}$, which in this case is also equal to $\sup_{m \ge 0} \inf_{n \ge 1} v_{m,n}$.

2.3 Extension to stochastic transitions

We generalize here theorem 2.5 to the case of stochastic transitions. We will
only consider transitions with finite support, and given a set $X$ we denote by
$\Delta_f(X)$ the set of probabilities with finite support over $X$. We consider now
stochastic dynamic programming problems of the following form. There is an
arbitrary non empty set of states $X$, a transition given by a multi-valued mapping
$F : X \rightrightarrows \Delta_f(X)$ with non empty values, and a payoff (or reward) function
$r : X \to [0,1]$. The interpretation is that given an initial state $x_0$ in $X$, a
decision-maker has to choose a probability with finite support $u_1$ in $F(x_0)$, then
$x_1$ is selected according to $u_1$ and there is a payoff $r(x_1)$. Then the player has
to select $u_2$ in $F(x_1)$, $x_2$ is selected according to $u_2$ and the player receives the
payoff $r(x_2)$, etc.

Following Maitra and Sudderth (1996), we say that $\Gamma = (X, F, r)$ is a Gambling
House. We identify an element $x$ in $X$ with its Dirac measure $\delta_x$ in $\Delta(X)$,
we write $Z = \Delta_f(X)$ and an element in $Z$ is written $u = \sum_{x \in X} u(x)\, \delta_x$. In case
the values of $F$ only consist of Dirac measures on $X$, we are in the previous case
of a dynamic programming problem.

We linearly extend $r$ and $F$ to $\Delta_f(X)$ by defining for each $u$ in $Z$, the payoff
$r(u) = \sum_{x \in X} r(x)\, u(x)$ and the transition
$F(u) = \{ \sum_{x \in X} u(x) f(x), \text{ s.t. } f : X \to Z \text{ and } f(x) \in F(x)\ \forall x \in X \}$.
A play at $x_0$ is a sequence $\sigma = (u_1, \dots, u_t, \dots) \in Z^\infty$
such that $u_1 \in F(x_0)$ and $u_{t+1} \in F(u_t)$ for each $t \ge 1$, and we denote by $\Sigma(x_0)$ the
set of plays at $x_0$. Given an evaluation $\theta$, the $\theta$-payoff of a play $\sigma = (u_1, \dots, u_t, \dots)$
is defined as $\gamma_\theta(\sigma) = \sum_{t \ge 1} \theta_t\, r(u_t)$, and the $\theta$-value at $x_0$ is:

$$v_\theta(x_0) = \sup_{\sigma \in \Sigma(x_0)} \gamma_\theta(\sigma).$$

$v_\theta$ is by definition a mapping from $X$ to $[0,1]$, and we define as before, for all $x$
in $X$:

$$v^*(x) = \inf_{\theta \in \Theta} \sup_{m \ge 0} v_{m,\theta}(x).$$

Theorem 2.5 easily extends to this context.

Theorem 2.8. Let $(\theta^k)_{k \ge 1}$ be a sequence of evaluations with vanishing total
variation, i.e. such that $TV(\theta^k) \to_{k \to \infty} 0$. We have:

$$\forall x \in X, \quad v^*(x) = \inf_{k \ge 1} \sup_{m \ge 0} v_{m,\theta^k}(x).$$

Moreover, the sequence $(v_{\theta^k})_k$ uniformly converges if and only if the metric space
$(\{v_{\theta^k}, k \ge 1\}, d_\infty)$ is totally bounded. And in case of convergence, the limit value
is $v^*$.

Proof. Consider the deterministic dynamic programming problem $\hat{\Gamma} = (Z, F, r)$.
For any evaluation $\theta$, the associated $\theta$-value function $\hat{v}_\theta : Z \to [0,1]$ is the affine
extension of $v_\theta : X \to [0,1]$. We put, as in definition 2.4, for all $z$ in $Z$:

$$\hat{v}^*(z) = \inf_{\theta \in \Theta} \sup_{m \ge 0} \hat{v}_{m,\theta}(z).$$

Notice that as an "infsup" of affine functions, there is no reason a priori for $\hat{v}^*$ to
be affine. However, the restriction of $\hat{v}^*$ to $X$ is $v^*$.

Consider now a sequence $(\theta^k)_{k \ge 1}$ of evaluations with vanishing total variation.
Applying theorem 2.5 to $\hat{\Gamma}$, we first obtain that for all $x$ in $X$:

$$v^*(x) = \inf_{k \ge 1} \sup_{m \ge 0} v_{m,\theta^k}(x).$$

Moreover, given two evaluations $\theta$ and $\theta'$, we have (using the same notation $d_\infty$
for the distances on $[0,1]^X$ and on $[0,1]^Z$):

$$\begin{aligned}
d_\infty(\hat{v}_\theta, \hat{v}_{\theta'}) &= \sup_{u \in Z} |\hat{v}_\theta(u) - \hat{v}_{\theta'}(u)| \\
&= \sup_{u \in Z} \Big| \int_{p \in X} \big( v_\theta(p) - v_{\theta'}(p) \big)\, du(p) \Big| \\
&= d_\infty(v_\theta, v_{\theta'}).
\end{aligned}$$

Consequently, $(\{\hat{v}_{\theta^k}, k \ge 1\}, d_\infty)$ is totally bounded if and only if
$(\{v_{\theta^k}, k \ge 1\}, d_\infty)$ is, and this completes the proof. □



3 Examples 

The first very simple example shows that, even when the set of states is finite,
it is not possible to obtain the conclusions of theorem 2.5 or corollaries 2.6 and
2.7 with sequences of evaluations satisfying the weaker convergence condition:
$\sup_{t \ge 1} \theta^k_t \to_{k \to \infty} 0$.

Example 3.1. Consider the following dynamic programming problem with 2
states: $Z = \{z_0, z_1\}$, $F(z_0) = \{z_1\}$, $F(z_1) = \{z_0\}$, with payoffs $r(z_0) = 0$ and
$r(z_1) = 1$. We have a deterministic Markov chain, so that any play alternates
forever between $z_0$ and $z_1$. Define for each $k$ the evaluations $\theta^k = \frac{1}{k} \sum_{t=1}^{k} \delta_{2t-1}$
and $\theta'^k = \frac{1}{k} \sum_{t=1}^{k} \delta_{2t}$. We have $v_{\theta^k}(z_0) = v_{\theta'^k}(z_1) = 1$, and
$v_{\theta^k}(z_1) = v_{\theta'^k}(z_0) = 0$ for all $k$. Define now $\nu^k$ as $\theta^k$ when $k$ is even, and $\theta'^k$ when
$k$ is odd. The evaluation $\nu^k$ satisfies $\sup_t \nu^k_t = \frac{1}{k} \to_{k \to \infty} 0$, however
$(v_{\nu^k}(z_0))_k$ and $(v_{\nu^k}(z_1))_k$ do not converge. □
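The oscillation in example 3.1 can be reproduced with a few lines of code. This is a small sketch with hypothetical helper names (`alternating_value`, `theta_odd`, `theta_even`, `nu`); states are encoded as 0 for $z_0$ and 1 for $z_1$.

```python
from fractions import Fraction

def alternating_value(start, theta):
    """theta-value in the chain z0 -> z1 -> z0 -> ... with r(z0) = 0 and r(z1) = 1.

    start is 0 for z0 and 1 for z1; theta is the list [theta_1, ..., theta_T].
    The state occupied at stage t is z1 exactly when start + t is odd.
    """
    return sum(w for t, w in enumerate(theta, start=1) if (start + t) % 2 == 1)

def theta_odd(k):
    """Weight 1/k on the odd stages 1, 3, ..., 2k - 1."""
    return [Fraction(1, k) if t % 2 == 1 else Fraction(0) for t in range(1, 2 * k)]

def theta_even(k):
    """Weight 1/k on the even stages 2, 4, ..., 2k."""
    return [Fraction(1, k) if t % 2 == 0 else Fraction(0) for t in range(1, 2 * k + 1)]

def nu(k):
    """The alternating sequence of the example: theta_odd for even k, theta_even for odd k."""
    return theta_odd(k) if k % 2 == 0 else theta_even(k)

values_at_z0 = [alternating_value(0, nu(k)) for k in range(1, 7)]
values_at_z1 = [alternating_value(1, nu(k)) for k in range(1, 7)]
assert values_at_z0 == [0, 1, 0, 1, 0, 1]
assert values_at_z1 == [1, 0, 1, 0, 1, 0]
assert all(max(nu(k)) == Fraction(1, k) for k in range(1, 7))
```

Since there is no control here, the value is simply the payoff of the unique play, which is why no maximisation appears in the sketch.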



Lehrer and Sorin (1992) proved that the uniform convergence of the Cesaro
values $(v_n)_{n \ge 1}$ was equivalent to the uniform convergence of the discounted values
$(v_\lambda)_{\lambda \in (0,1]}$. The following example shows that this property does not extend to
general evaluations: given 2 sequences of TV-vanishing evaluations $(\theta^k)_{k \ge 1}$ and
$(\theta'^k)_{k \ge 1}$, the uniform convergence of $(v_{\theta^k})_k$ and $(v_{\theta'^k})_k$ are not equivalent.

Example 3.2. In this example, $(v_n)_n$ will pointwise converge to the constant
1/2 whereas for a particular sequence of evaluations $(\theta^k)_k$ with total variation
going to zero, we will have $v_{\theta^k}(z) = 1$ for all $k$ and $z$.

We construct a dynamic programming problem defined via a rooted tree $T$
without terminal nodes (as in Sorin Monderer 1992 or Lehrer Monderer 1994). $T$
has countably many nodes, and the payoff attached to each node is either 0 or 1.

We first construct a tree $T_1$, with countably many nodes and root $z_0$. Each
node has outgoing degree one, except the root which has countably many
potential successors $z_1, z_2, \dots, z_n, \dots$ On the $n$th branch starting from $z_n$, each node
has a unique successor and the payoffs starting from $z_n$ are successively 0 for $n$
stages, then 1 for $n$ stages, then 0 until the end of the play.

We now define $T$ inductively from $T_1$. $T_2$ is obtained from $T_1$ by attaching
the tree $T_1$ to each node of $T_1 \setminus \{z_0\}$. This means that for each node $z$ of $T_1 \setminus \{z_0\}$
we add a copy of the tree $T_1$ where $z$ plays the role of the root of $T_1$. And for
each $l$, the tree $T_l$ is obtained by attaching the tree $T_1$ to each node of $T_{l-1} \setminus T_{l-2}$.
Finally, $T$ is defined as the union $\cup_{l \ge 1} T_l$.

Starting from $z_0$, any sequence of $n$ consecutive payoffs of 1 has to be preceded
by $n$ consecutive payoffs of 0, so $v_n(z_0) \le 1/2$ for each $n \ge 1$, and for each node
$z$ and even integer $n$ it is possible to get exactly $n/2$ payoffs of 0 followed by $n/2$
payoffs of 1. Consequently one can deduce that $(v_n(z))_n$ converges to 1/2 for each
state $z$. But $\sup_{z \in Z} v_n(z) = 1$ for each $n$, and the convergence is not uniform.

Consider now for any $k$, the evaluation
$\theta^k = (0, \dots, 0, \frac{1}{k}, \dots, \frac{1}{k}, 0, \dots) = \frac{1}{k} \sum_{t=1}^{k} \delta_{t+k}$.
We have $v_{\theta^k}(z) = 1$ for all $k$ and $z$, so $(v_{\theta^k})_k$ uniformly converges to $v^* = 1$. □

Example 3.3. The condition "$(\{v_\theta, \theta \in \Theta\}, d_\infty)$ totally bounded" is satisfied under
the hypotheses of corollary 2.6 or corollary 2.7, and is sufficient to obtain the
general uniform convergence of the value functions. This condition turns out to
be stronger than having $(\{v_{\theta^k}, k \ge 1\}, d_\infty)$ totally bounded for every sequence of
evaluations with vanishing TV.

In the following example, there is no control and the state space $Z$ is the set
of all integers, with transition given by the shift: $F(z) = \{z + 1\}$. The payoffs
are given by $r(0) = 1$ and $r(z) = 0$ for all $z \ne 0$.

For all evaluations $\theta = (\theta_t)_{t \ge 1}$, we have $\sup_{z \in Z} v_\theta(z) = \sup_t \theta_t$, so we have
general uniform convergence of the value functions to $v^* = 0$.

For all positive $t$, we can consider the evaluation $\delta_t$ given by the Dirac measure
on $t$. We have $v_{\delta_t}(-t) = 1$, and $v_{\delta_t}(z) = 0$ if $z \ne -t$. The set $\{v_{\delta_t}, t \ge 1\}$ is not
totally bounded. □
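The two claims of example 3.3 can be checked on a finite window of states. The sketch below is illustrative; `shift_value` and `dirac` are hypothetical helper names, and only finitely supported evaluations, given as lists, are handled.

```python
from fractions import Fraction

def shift_value(z, theta):
    """theta-value at the integer state z for F(z) = {z + 1}, r(0) = 1, r = 0 elsewhere.

    Stage t is spent in state z + t, which equals 0 only when t = -z.
    """
    t = -z
    return theta[t - 1] if 1 <= t <= len(theta) else 0

def dirac(t):
    """Dirac evaluation on stage t, as a list of length t."""
    return [0] * (t - 1) + [1]

states = range(-12, 3)  # finite window containing the states -10, ..., -1

# sup_z v_theta(z) = sup_t theta_t for a sample evaluation.
theta = [Fraction(1, 6), Fraction(1, 2), Fraction(1, 3)]
assert max(shift_value(z, theta) for z in states) == max(theta)

# Distinct Dirac evaluations give value functions at sup distance 1 from each other,
# so no finite family of balls of radius 1/2 can cover {v_{delta_t}, t >= 1}.
for s in range(1, 11):
    for t in range(1, 11):
        if s != t:
            gap = max(abs(shift_value(z, dirac(s)) - shift_value(z, dirac(t))) for z in states)
            assert gap == 1
```

The pairwise distance 1 between the functions $v_{\delta_t}$ is what prevents total boundedness, even though each $v_\theta$ is uniformly small when $\sup_t \theta_t$ is small.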



4 Proof of theorem 2.5 



We start with a few notations and definitions. We define inductively a sequence
of correspondences $(F^n)_n$ from $Z$ to $Z$, by $F^0(z) = \{z\}$ for every state $z$, and
$\forall n \ge 0$, $F^{n+1} = F^n \circ F$ (the composition being defined by
$G \circ H(z) = \{z'' \in Z, \exists z' \in H(z), z'' \in G(z')\}$). $F^n(z)$ represents the set of states that the decision
maker can reach in $n$ stages from the initial state $z$. We also define for every state
$z$, $G^m(z) = \cup_{n=0}^{m} F^n(z)$ and $G^\infty(z) = \cup_{n=0}^{\infty} F^n(z)$. The set $G^\infty(z)$ is the set of
states that the decision maker, starting from $z$, can reach in a finite number of
stages.
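On a finite transition graph, the sets $F^n(z)$ and $G^m(z)$ can be computed by iterating the correspondence. The following is a minimal sketch; the function names and the three-state graph are hypothetical.

```python
def reach_exact(F, z, n):
    """F^n(z): states reachable from z in exactly n stages."""
    current = {z}
    for _ in range(n):
        current = {y for x in current for y in F[x]}
    return current

def reach_within(F, z, m):
    """G^m(z): union of F^0(z), ..., F^m(z)."""
    reached, current = {z}, {z}
    for _ in range(m):
        current = {y for x in current for y in F[x]}
        reached |= current
    return reached

# Hypothetical transition correspondence on three states.
F = {"a": ["a", "b"], "b": ["b"], "c": ["a"]}

assert reach_exact(F, "c", 0) == {"c"}
assert reach_exact(F, "c", 1) == {"a"}
assert reach_exact(F, "c", 2) == {"a", "b"}
assert reach_within(F, "c", 2) == {"a", "b", "c"}
```

With finitely many states the increasing sequence $G^m(z)$ is eventually constant and equal to $G^\infty(z)$; in the general case treated in the paper only an approximate version of this stabilisation is available.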

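For all $\theta$ in $\Theta$, $m \ge 0$ and initial state $z$, we clearly have:

$$v_{m,\theta}(z) = \sup_{z' \in F^m(z)} v_\theta(z') = \sup_{s \in S(z)} \sum_{t=1}^{\infty} \theta_t\, r(z_{m+t}).$$

In the sequel, we fix a sequence of evaluations $(\theta^k)_{k \ge 1}$ such that $TV(\theta^k) \to_{k \to \infty} 0$.

Lemma 4.1. For all $m_0 \ge 0$ and $z$ in $Z$,

$$\liminf_k \sup_{m \le m_0} v_{m,\theta^k}(z) = \liminf_k v_{\theta^k}(z).$$

Proof: For each $k$, we have $v_{\theta^k}(z) \ge v_{1,\theta^k}(z) - \theta^k_1 - TV(\theta^k)$ by lemma 2.2, so
$v_{\theta^k}(z) \ge v_{1,\theta^k}(z) - 2\, TV(\theta^k)$. Iterating, we obtain that:
$v_{\theta^k}(z) \ge \sup_{m \le m_0} v_{m,\theta^k}(z) - 2 m_0\, TV(\theta^k)$. Since the supremum over $m \le m_0$
includes $m = 0$, we also have $\sup_{m \le m_0} v_{m,\theta^k}(z) \ge v_{\theta^k}(z)$, and the result follows
from $TV(\theta^k) \to_{k \to \infty} 0$. □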

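A key result is the following proposition, which is true for all evaluations $\theta$.

Proposition 4.2. For all evaluations $\theta$ in $\Theta$ and initial state $z$ in $Z$,

$$\sup_{z' \in G^\infty(z)} v_\theta(z') \ge \limsup_k v_{\theta^k}(z).$$

Proof of proposition 4.2: $z$ and $\theta$ being fixed, put $\beta = \sup_{z' \in G^\infty(z)} v_\theta(z')$. Fix
$\varepsilon \in (0,1]$; there exists $T_0$ such that $\sum_{t=T_0+1}^{\infty} \theta_t \le \varepsilon$, and fix $T_1 \ge T_0/\varepsilon$.

For any play $s = (z_1, \dots, z_t, \dots)$ in $S(z)$, we have by definition of $\beta$ that for all
$T \ge 0$, $\sum_{t=T+1}^{\infty} \theta_{t-T}\, r(z_t) \le \beta$. Let $m$ be a non negative integer, we define:

$$A_m = \sum_{T=mT_1}^{(m+1)T_1 - 1} \ \sum_{t=T+1}^{\infty} \theta_{t-T}\, r(z_t) \le T_1 \beta.$$

We obtain:

$$\begin{aligned}
A_m &= \sum_{t=mT_1+1}^{\infty} r(z_t) \sum_{T=mT_1}^{\min\{(m+1)T_1 - 1,\, t-1\}} \theta_{t-T} \\
&\ge \sum_{t=mT_1+1}^{(m+1)T_1} r(z_t)\, (\theta_1 + \dots + \theta_{t-mT_1}) \\
&\ge (1-\varepsilon) \sum_{t=T_0+mT_1}^{(m+1)T_1} r(z_t) \\
&\ge (1-\varepsilon) \Big( \sum_{t=1+mT_1}^{(m+1)T_1} r(z_t) - (T_0 - 1) \Big),
\end{aligned}$$

and, writing $\gamma_{mT_1,T_1}(s)$ for the average payoff of $s$ over the stages $mT_1+1, \dots, (m+1)T_1$,

$$T_1 \beta \ge (1-\varepsilon)\, T_1\, \gamma_{mT_1,T_1}(s) - (1-\varepsilon)(T_0 - 1). \qquad (1)$$

We now consider $\gamma_{\theta^k}(s)$ for $k$ large. We compute $\sum_{t=1}^{\infty} \theta^k_t\, r(z_t)$ by dividing the
stages into blocks of length $T_1$. For each $m \ge 0$, let $\bar{\theta}^k(m)$ be the Cesaro average
of $\theta^k_t$, where $t$ ranges from $mT_1 + 1$ to $(m+1)T_1$. Notice that for all such $t$, we
have $|\theta^k_t - \bar{\theta}^k(m)| \le \sum_{t'=mT_1+1}^{(m+1)T_1-1} |\theta^k_{t'+1} - \theta^k_{t'}|$. We have:

$$\begin{aligned}
\sum_{t=mT_1+1}^{(m+1)T_1} \theta^k_t\, r(z_t)
&\le \sum_{t=mT_1+1}^{(m+1)T_1} \bar{\theta}^k(m)\, r(z_t) + \sum_{t=mT_1+1}^{(m+1)T_1} |\theta^k_t - \bar{\theta}^k(m)|\, r(z_t) \\
&\le T_1\, \bar{\theta}^k(m)\, \gamma_{mT_1,T_1}(s) + T_1 \sum_{t=mT_1+1}^{(m+1)T_1-1} |\theta^k_{t+1} - \theta^k_t| \\
&\le T_1\, \bar{\theta}^k(m) \Big( \frac{\beta}{1-\varepsilon} + \varepsilon \Big) + T_1 \sum_{t=mT_1+1}^{(m+1)T_1-1} |\theta^k_{t+1} - \theta^k_t|,
\end{aligned}$$

where the last inequality follows from equation (1) and from $(T_0 - 1)/T_1 \le \varepsilon$. Summing
up over $m$, and using $\sum_{m \ge 0} T_1\, \bar{\theta}^k(m) = 1$, we obtain:

$$\gamma_{\theta^k}(s) \le \frac{\beta}{1-\varepsilon} + \varepsilon + T_1\, TV(\theta^k).$$

Consequently, $\limsup_k v_{\theta^k}(z) \le \frac{\beta}{1-\varepsilon} + \varepsilon$, and this is true for all $\varepsilon$. □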



Corollary 4.3.

$$\inf_{\theta \in \Theta} \sup_{m \ge 0} v_{m,\theta} = \inf_{k \ge 1} \sup_{m \ge 0} v_{m,\theta^k}.$$

Proof: Consider an initial state $z$, and write $\alpha = \inf_k \sup_m v_{m,\theta^k}(z)$. It is clear
that $\alpha \ge \inf_{\theta \in \Theta} \sup_{m \ge 0} v_{m,\theta}(z)$. Now for each $k \ge 1$ there exists $m(k)$ such that
$v_{m(k),\theta^k}(z) \ge \alpha - 1/k$, and we define the evaluation $\theta'^k = \sum_{t=m(k)+1}^{\infty} \theta^k_{t-m(k)}\, \delta_t$.
We have $TV(\theta'^k) \le \theta^k_1 + TV(\theta^k) \le 2\, TV(\theta^k) \to_{k \to \infty} 0$, so by proposition 4.2
(applied to the sequence $(\theta'^k)_k$) we obtain that for all evaluations $\theta$,
$\sup_{z' \in G^\infty(z)} v_\theta(z') \ge \limsup_k v_{\theta'^k}(z) \ge \alpha$. □

From lemma 4.1 and proposition 4.2 one can easily deduce the following
corollary.

Corollary 4.4. For all $m_0 \ge 0$ and $z$ in $Z$,

$$\inf_{k \ge 1} \sup_{m \le m_0} v_{m,\theta^k}(z) \le \liminf_k v_{\theta^k}(z) \le \limsup_k v_{\theta^k}(z) \le \inf_{k \ge 1} \sup_{m \ge 0} v_{m,\theta^k}(z).$$

And we can now conclude the proof of theorem 2.5, proceeding as in the
proof of theorem 3.10 in Renault, 2011.

End of the proof of theorem 2.5: Define $d(z, z') = \sup_{k \ge 1} |v_{\theta^k}(z) - v_{\theta^k}(z')|$
for all states $z$ and $z'$. The space $(Z, d)$ is now a pseudometric space (it may not
be Hausdorff). Fix $\varepsilon > 0$. By the total boundedness assumption, there exists a finite set
of indices $I$ such that for all $k \ge 1$, there exists $i$ in $I$ satisfying
$d_\infty(v_{\theta^k}, v_{\theta^i}) \le \varepsilon$. Consider now the set
$\{(v_{\theta^i}(z))_{i \in I}, z \in Z\}$: it is a subset of the compact metric space $[0,1]^I$ with the
uniform distance, so it is itself precompact and we obtain the existence of a finite
subset $C$ of states in $Z$ such that:

$$\forall z \in Z, \exists c \in C, \forall i \in I, \quad |v_{\theta^i}(z) - v_{\theta^i}(c)| \le \varepsilon.$$

We have obtained that for each $\varepsilon > 0$, there exists a finite subset $C$ of $Z$ such that
for every $z$ in $Z$, there is $c \in C$ with $d(z, c) \le 3\varepsilon$. The pseudometric space $(Z, d)$ is
itself precompact. Equivalently, any sequence in $Z$ admits a Cauchy subsequence
for $d$. Notice that all value functions $v_{\theta^k}$ are clearly 1-Lipschitz for $d$.

Fix $z$ in $Z$, and consider now the sequence of sets $(G^m(z))_{m \ge 0}$. For all $m$,
$G^m(z) \subset G^{m+1}(z)$, so using the precompactness of $(Z, d)$ it is not difficult to show
(see, e.g. step 2 in the proof of theorem 3.7 in Renault, 2011) that $(G^m(z))_{m \ge 0}$
converges to $G^\infty(z)$, in the sense that:

$$\forall \varepsilon > 0, \exists m \ge 0, \forall z' \in G^\infty(z), \exists z'' \in G^m(z), \quad d(z', z'') \le \varepsilon. \qquad (2)$$

We now use corollary 4.4 to conclude. We have for all $m$:

$$\inf_{k \ge 1} \sup_{z' \in G^m(z)} v_{\theta^k}(z') \le \liminf_k v_{\theta^k}(z) \le \limsup_k v_{\theta^k}(z) \le \inf_{k \ge 1} \sup_{z' \in G^\infty(z)} v_{\theta^k}(z').$$

Fix finally $\varepsilon > 0$, and consider $k \ge 1$ and $m \ge 0$ given by equation (2). Let
$z'$ in $G^\infty(z)$ be such that $v_{\theta^k}(z') \ge \sup_{y \in G^\infty(z)} v_{\theta^k}(y) - \varepsilon$. Let $z''$ in $G^m(z)$ be
such that $d(z', z'') \le \varepsilon$. Since $v_{\theta^k}$ is 1-Lipschitz for $d$, we obtain
$v_{\theta^k}(z'') \ge \sup_{y \in G^\infty(z)} v_{\theta^k}(y) - 2\varepsilon$. Consequently,
$\sup_{y \in G^m(z)} v_{\theta^k}(y) \ge \sup_{y \in G^\infty(z)} v_{\theta^k}(y) - 2\varepsilon$ for all $k$, so

$$\inf_{k \ge 1} \sup_{y \in G^m(z)} v_{\theta^k}(y) \ge \inf_{k \ge 1} \sup_{y \in G^\infty(z)} v_{\theta^k}(y) - 2\varepsilon.$$

We obtain $\liminf_k v_{\theta^k}(z) \ge \limsup_k v_{\theta^k}(z) - 2\varepsilon$, and so $(v_{\theta^k}(z))_k$ converges.
Since $(Z, d)$ is precompact and all $v_{\theta^k}$ are 1-Lipschitz, the convergence is uniform.
□

5 An open question 

We know since Lehrer and Sorin (1992) that the uniform convergence of the
Cesaro values $(v_n)_{n \ge 1}$ is equivalent to the uniform convergence of the discounted
values $(v_\lambda)_{\lambda \in (0,1]}$. Example 3.2 shows that it is possible to have no uniform
convergence of the Cesaro values (or equivalently of the discounted values) but uniform
convergence for a particular sequence of evaluations with vanishing TV. Could it
be the case that the Cesaro values and the discounted values have the following
"universal" property?

Assuming uniform convergence of the Cesaro values, do we have general uniform
convergence of the value functions, i.e. is it true that $(v_{\theta^k})_k$ uniformly
converges for every sequence of evaluations $(\theta^k)_{k \ge 1}$ such that $TV(\theta^k) \to_{k \to \infty} 0$?

The above property is true in case of an uncontrolled problem (zero-player),
i.e. when the transition $F$ is single-valued.

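Proposition 5.1. For an uncontrolled problem, the uniform convergence of the
Cesaro values implies the general uniform convergence of the value functions:

$$\forall \varepsilon > 0, \exists \alpha > 0, \forall \theta \in \Theta \text{ s.t. } TV(\theta) \le \alpha, \quad \|v_\theta - v^*\| \le \varepsilon.$$

Proof: Fix $\varepsilon > 0$. By assumption there exists $N$ such that for all states $z$ in $Z$,
$|v_N(z) - v^*(z)| \le \varepsilon$. Consider an arbitrary evaluation $\theta$ and an initial state $z_0$.
For each positive $t$ we denote by $z_t$ the state reached from $z_0$ in $t$ stages; we have
$v_\theta(z_0) = \sum_{t=1}^{\infty} \theta_t\, r(z_t)$ and $v^*(z_0) = v^*(z_t)$ for all $t$.

Divide the set of stages into consecutive blocks of length $N$: $B^0 = \{1, \dots, N\}$, ...,
$B^m = \{mN + 1, \dots, (m+1)N\}$, ... Denote by $\bar{\theta}(m)$ the mean of $\theta$ over $B^m$; we
have $\sum_{m=0}^{\infty} N \bar{\theta}(m) = 1$. We also write $\bar{r}(m)$ for the mean $\frac{1}{N} \sum_{t \in B^m} r(z_t)$. We
have $\bar{r}(m) = v_N(z_{mN})$, so $|\bar{r}(m) - v^*(z_0)| \le \varepsilon$ for all $m$.

Computing payoffs by blocks, we have

$$\begin{aligned}
v_\theta(z_0) &= \sum_{m=0}^{\infty} \sum_{t \in B^m} \theta_t\, r(z_t) \\
&= \sum_{m=0}^{\infty} \sum_{t \in B^m} (\theta_t - \bar{\theta}(m))\, r(z_t) + \sum_{m=0}^{\infty} N \bar{\theta}(m)\, \bar{r}(m).
\end{aligned}$$

So we obtain:

$$v_\theta(z_0) - v^*(z_0) = \sum_{m=0}^{\infty} \sum_{t \in B^m} (\theta_t - \bar{\theta}(m))\, r(z_t) + \sum_{m=0}^{\infty} N \bar{\theta}(m)\, (\bar{r}(m) - v^*(z_0)),$$

and

$$|v_\theta(z_0) - v^*(z_0)| \le \sum_{m=0}^{\infty} N \sum_{t \in B^m} |\theta_{t+1} - \theta_t| + \varepsilon \le N\, TV(\theta) + \varepsilon.$$

If $TV(\theta) \le \frac{\varepsilon}{N}$, we get $|v_\theta(z_0) - v^*(z_0)| \le 2\varepsilon$, hence the result. □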
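The final estimate can be illustrated on a periodic uncontrolled chain, where the Cesaro values over one full period are exactly the mean reward, so that one may take $N$ equal to the period and $\varepsilon = 0$. This is a sketch under that assumption; the cycle, the helper names and the choice of evaluation are hypothetical.

```python
from fractions import Fraction

def cycle_value(rewards, start, theta):
    """theta-value at state `start` of the uncontrolled cycle i -> (i + 1) mod p."""
    p = len(rewards)
    return sum(w * rewards[(start + t) % p] for t, w in enumerate(theta, start=1))

def total_variation(theta):
    """TV of a finitely supported evaluation given as a list (zeros afterwards)."""
    padded = list(theta) + [0]
    return sum(abs(b - a) for a, b in zip(padded, padded[1:]))

rewards = [1, 0, 0]                            # hypothetical cycle of period p = 3
p = len(rewards)
limit = Fraction(sum(rewards), p)              # common value of v_p at every state

theta = [Fraction(1, 10)] * 10                 # Cesaro evaluation with n = 10
gap = max(abs(cycle_value(rewards, z, theta) - limit) for z in range(p))

# Bound from the proof with N = p and epsilon = 0: |v_theta - v*| <= N * TV(theta).
assert gap <= p * total_variation(theta)
```

The bound is far from tight in this instance; the point of the sketch is only that the distance to the limit is controlled by the block length times the total variation.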